Extraction of informative statistical features in the problem of forecasting time series generated by Itô-type processes

Korolev, Victor, Ivanov, Mikhail, Kukanova, Tatiana, Rukavitsa, Artyom, Vakshin, Alexander, Solomonov, Peter, Zeifman, Alexander

arXiv.org Machine Learning

In this paper, we consider the problem of extracting the most informative features from time series regarded as observed values of stochastic processes satisfying Itô stochastic differential equations with unknown random drift and diffusion coefficients. We do not draw on any additional information and use only the information contained in the time series itself. As additional features, we therefore use the parameters of statistically adjusted mixture-type models of the observed regularities in the behavior of the time series. Several algorithms for constructing these parameters are discussed. These algorithms are based on statistical reconstruction of the coefficients, which in turn relies on statistical separation of normal mixtures. We obtain two types of parameters through uniform and non-uniform statistical reconstruction of the coefficients of the underlying Itô process. The reconstructed coefficients obtained by the uniform techniques do not depend on the current value of the process, whereas the non-uniform techniques reconstruct the coefficients taking into account their dependence on the value of the process. In effect, the non-uniform techniques used in this paper represent a stochastic analog of the Taylor expansion for the time series. The efficiency of the obtained additional features is compared by using them in autoregressive algorithms for time series prediction. To obtain clean conclusions, unaffected by extraneous factors such as a particular choice of neural-network architecture, we used only simple autoregressive algorithms. We show that the use of the additional statistical features improves prediction.
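The uniform reconstruction idea can be illustrated in a few lines: for dX_t = a dt + b dW_t, increments over a step dt are approximately normal with mean a·dt and variance b²·dt, so windowed moments of the increments yield rough coefficient estimates usable as features. This is a minimal sketch under that moment approximation; the paper's actual algorithms separate normal mixtures rather than using plain moments, and the function and parameter names here are hypothetical:

```python
import numpy as np

def uniform_drift_diffusion_features(x, dt, window):
    """Windowed moment estimates of Ito coefficients (hypothetical helper).

    For dX_t = a dt + b dW_t, increments over step dt are approximately
    N(a*dt, b^2*dt), so the windowed mean and variance of the increments
    give rough estimates of a and b^2 that can serve as extra features.
    """
    dx = np.diff(x)
    drift, diff2 = [], []
    for i in range(len(dx) - window + 1):
        w = dx[i:i + window]
        drift.append(w.mean() / dt)   # estimate of the drift a
        diff2.append(w.var() / dt)    # estimate of the squared diffusion b^2
    return np.array(drift), np.array(diff2)
```

These per-window estimates can then be appended to the lagged values fed into an autoregressive predictor.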


PRIM-cipal components analysis

Liu, Tianhao, Díaz-Pachón, Daniel Andrés, Rao, J. Sunil

arXiv.org Machine Learning

Even supervised learning is subject to the famous No Free Lunch Theorems (NFLTs) [1]-[3], which say that, in combinatorial optimization, there is no universal algorithm that works better than its competitors for every objective function [4]-[6]. Indeed, David Wolpert has recently proven that, on average, cross-validation performs as well as anti-cross-validation (choosing among a set of candidate algorithms based on which has the worst out-of-sample behavior) for supervised learning. Still, he acknowledges that "it is hard to imagine any scientist who would not prefer to use [cross-validation] to using anti-cross-validation" [7]. On the other hand, unsupervised learning has seldom been studied from the perspective of the NFLTs. This may be because the adjective "unsupervised" suggests that no human input is needed, which is misleading, as many unsupervised tasks are combinatorial optimization problems that depend on the choice of the objective function. For instance, it is well known that, among the eigenvectors of the covariance matrix, Principal Components Analysis selects those with the largest variances [8]. However, mode-hunting techniques that rely on spectral manipulation aim at the opposite objective: selecting the eigenvectors of the covariance matrix with the smallest variances [9], [10]. Therefore, unlike in supervised learning, where it is difficult to identify reasons to optimize with respect to anti-cross-validation, in unsupervised learning there are strong reasons to reduce dimensionality for variance minimization. D. A. Díaz-Pachón and T. Liu are with the Division of Biostatistics, University of Miami, Miami, FL, 33136 USA (e-mail: ddiaz3@miami.edu).
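The contrast between the two objectives mentioned above, PCA keeping the largest-variance eigenvectors versus mode hunting keeping the smallest, amounts to flipping which end of the eigenvalue spectrum is retained. A minimal illustrative sketch (the function name and flag are hypothetical, not from the paper):

```python
import numpy as np

def spectral_projection(X, k, smallest=False):
    """Project centered data onto k eigenvectors of the covariance matrix.

    smallest=False keeps the largest-variance directions (ordinary PCA);
    smallest=True keeps the smallest-variance directions, the opposite
    objective pursued by spectral mode-hunting techniques.
    """
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    idx = np.arange(k) if smallest else np.arange(vals.size - k, vals.size)
    return Xc @ vecs[:, idx]
```

The same decomposition serves both objectives; only the choice of which eigenvectors to keep, i.e., the objective function, differs.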


Cost-optimal Sequential Testing via Doubly Robust Q-learning

Zhou, Doudou, Zhang, Yiran, Jin, Dian, Zheng, Yingye, Tian, Lu, Cai, Tianxi

arXiv.org Machine Learning

Clinical decision-making often involves selecting tests that are costly, invasive, or time-consuming, motivating individualized, sequential strategies for what to measure and when to stop ascertaining. We study the problem of learning cost-optimal sequential decision policies from retrospective data, where test availability depends on prior results, inducing informative missingness. Under a sequential missing-at-random mechanism, we develop a doubly robust Q-learning framework for estimating optimal policies. The method introduces path-specific inverse probability weights that account for heterogeneous test trajectories and satisfy a normalization property conditional on the observed history. By combining these weights with auxiliary contrast models, we construct orthogonal pseudo-outcomes that enable unbiased policy learning when either the acquisition model or the contrast model is correctly specified. We establish oracle inequalities for the stage-wise contrast estimators, along with convergence rates, regret bounds, and misclassification rates for the learned policy. Simulations demonstrate improved cost-adjusted performance over weighted and complete-case baselines, and an application to a prostate cancer cohort study illustrates how the method reduces testing cost without compromising predictive accuracy.
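The orthogonal pseudo-outcomes described above follow the familiar augmented inverse-probability-weighting pattern: combine a model prediction with an inverse-probability-weighted residual so that the result stays unbiased if either component is correct. A simplified one-stage sketch (function and argument names are hypothetical; the paper's method uses path-specific weights across a multi-stage Q-learning recursion, which is not reproduced here):

```python
import numpy as np

def dr_pseudo_outcome(y, observed, pi_hat, m_hat):
    """Doubly robust (AIPW-style) pseudo-outcome for one stage.

    y        -- test result (only meaningful where observed == 1)
    observed -- indicator that the result was ascertained
    pi_hat   -- estimated acquisition probability given the history
    m_hat    -- auxiliary model prediction of the outcome/contrast

    The mean of the pseudo-outcome is unbiased if either pi_hat or
    m_hat is correctly specified (double robustness).
    """
    y_filled = np.where(observed.astype(bool), y, 0.0)  # avoid using missing y
    return m_hat + observed / pi_hat * (y_filled - m_hat)
```

Where a subject's test is unobserved, the pseudo-outcome falls back to the model prediction m_hat; where it is observed, the weighted residual corrects the model's error.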


Aligning Validation with Deployment: Target-Weighted Cross-Validation for Spatial Prediction

Brenning, Alexander, Suesse, Thomas

arXiv.org Machine Learning

Cross-validation (CV) is commonly used to estimate predictive risk when independent test data are unavailable. Its validity depends on the assumption that validation tasks are sampled from the same distribution as prediction tasks encountered during deployment. In spatial prediction and other settings with structured data, this assumption is frequently violated, leading to biased estimates of deployment risk. We propose Target-Weighted CV (TWCV), an estimator of deployment risk that accounts for discrepancies between validation and deployment task distributions, thus accounting for (1) covariate shift and (2) task-difficulty shift. We characterize prediction tasks by descriptors such as covariates and spatial configuration. TWCV assigns weights to validation losses such that the weighted empirical distribution of validation tasks matches the corresponding distribution over a target domain. The weights are obtained via calibration weighting, yielding an importance-weighted estimator that targets deployment risk. Since TWCV requires adequate coverage of the deployment distribution's support, we combine it with spatially buffered resampling that diversifies the task difficulty distribution. In a simulation study, conventional as well as spatial estimators exhibit substantial bias depending on sampling, whereas buffered TWCV remains approximately unbiased across scenarios. A case study in environmental pollution mapping further confirms that discrepancies between validation and deployment task distributions can affect performance assessment, and that buffered TWCV better reflects the prediction task over the target domain. These results establish task distribution mismatch as a primary source of CV bias in spatial prediction and show that calibration weighting combined with a suitable validation task generator provides a viable approach to estimating predictive risk under dataset shift.
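At its core, the estimator described above is an importance-weighted average of validation losses, with weights chosen so the weighted validation-task distribution matches the deployment distribution. A minimal sketch that takes such calibration weights as given rather than estimating them (the function name is hypothetical, not from the paper):

```python
import numpy as np

def target_weighted_cv(losses, weights):
    """Importance-weighted CV risk estimate (hypothetical helper).

    losses  -- per-validation-task losses from resampling
    weights -- calibration weights matching the empirical validation-task
               distribution to the deployment (target) distribution;
               assumed precomputed here
    """
    w = np.asarray(weights, dtype=float)
    l = np.asarray(losses, dtype=float)
    return float(np.sum(w * l) / np.sum(w))  # self-normalized weighted mean
```

With uniform weights this reduces to ordinary CV; non-uniform weights up-weight validation tasks that resemble deployment tasks, which is why adequate support coverage (hence the buffered resampling) matters.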


Topological Detection of Hopf Bifurcations via Persistent Homology: A Functional Criterion from Time Series

Barrios, Jhonathan, Echávez, Yásser, Álvarez, Carlos F.

arXiv.org Machine Learning

We propose a topological framework for the detection of Hopf bifurcations directly from time series, based on persistent homology applied to phase space reconstructions via Takens embedding within the framework of Topological Data Analysis. The central idea is that changes in the dynamical regime are reflected in the emergence or disappearance of a dominant one-dimensional homological feature in the reconstructed attractor. To quantify this behavior, we introduce a simple and interpretable scalar topological functional defined as the maximum persistence of homology classes in dimension one. This functional is used to construct a computable criterion for identifying critical parameters in families of dynamical systems without requiring knowledge of the underlying equations. The proposed approach is validated on representative systems of increasing complexity, showing consistent detection of the bifurcation point. The results support the interpretation of dynamical transitions as topological phase transitions and demonstrate the potential of topological data analysis as a model-free tool for the quantitative analysis of nonlinear time series.
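The two computational ingredients, delay embedding and the maximum-persistence functional, are short to state. A minimal sketch (function names are hypothetical; computing the H1 persistence diagram itself is delegated to a TDA library such as ripser and is not shown):

```python
import numpy as np

def takens_embedding(x, dim, delay):
    """Delay-coordinate (Takens) embedding of a scalar time series
    into dim-dimensional phase space."""
    n = len(x) - (dim - 1) * delay
    return np.stack([x[i:i + n] for i in range(0, dim * delay, delay)], axis=1)

def max_persistence(diagram_h1):
    """Scalar topological functional: the maximum lifetime (death - birth)
    over an H1 persistence diagram, given as an (n, 2) array of
    (birth, death) pairs produced by a TDA library."""
    d = np.asarray(diagram_h1, dtype=float)
    if d.size == 0:
        return 0.0
    return float(np.max(d[:, 1] - d[:, 0]))
```

Sweeping the system parameter and tracking this functional over the embedded point clouds would then flag the bifurcation as a jump in the dominant loop's persistence.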


Complete Causal Identification from Ancestral Graphs under Selection Bias

Chen, Leihao, Mooij, Joris M.

arXiv.org Machine Learning

Many causal discovery algorithms, including the celebrated FCI algorithm, output a Partial Ancestral Graph (PAG). PAGs serve as an abstract graphical representation of the underlying causal structure, modeled by directed acyclic graphs with latent and selection variables. This paper develops a characterization of the set of extended-type conditional independence relations that are invariant across all causal models represented by a PAG. This theory allows us to formulate a general measure-theoretic version of Pearl's causal calculus and a sound and complete identification algorithm for PAGs under selection bias. Our results also apply when PAGs are learned by certain algorithms that integrate observational data with experimental data and incorporate background knowledge.


A two-step sequential approach for hyperparameter selection in finite context models

Contente, José, Martins, Ana, Pinho, Armando J., Gouveia, Sónia

arXiv.org Machine Learning

Finite-context models (FCMs) are widely used for compressing symbolic sequences such as DNA, where predictive performance depends critically on the context length k and smoothing parameter α. In practice, these hyperparameters are typically selected through exhaustive search, which is computationally expensive and scales poorly with model complexity. This paper proposes a statistically grounded two-step sequential approach for efficient hyperparameter selection in FCMs. The key idea is to decompose the joint optimization problem into two independent stages. First, the context length k is estimated using categorical serial dependence measures, including Cramér's V, Cohen's κ and partial mutual information (pami). Second, the smoothing parameter α is estimated via maximum likelihood conditional on the selected context length k. Simulation experiments were conducted on synthetic symbolic sequences generated by FCMs across multiple (k, α) configurations, considering a four-letter alphabet and different sample sizes. Results show that the dependence measures are substantially more sensitive to variations in k than in α, supporting the sequential estimation strategy. As expected, the accuracy of the hyperparameter estimation improves with increasing sample size. Furthermore, the proposed method achieves compression performance comparable to exhaustive grid search in terms of average bitrate (bits per symbol), while substantially reducing computational cost. Overall, the results on simulated data show that the proposed sequential approach is a practical and computationally efficient alternative to exhaustive hyperparameter tuning in FCMs.
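The first stage hinges on measuring serial dependence between symbols at increasing lags; Cramér's V, one of the measures named above, can be computed from a lagged contingency table. A minimal sketch of that single measure (the function name is hypothetical, and selecting k from the lag profile is not shown):

```python
import numpy as np

def cramers_v_at_lag(seq, lag):
    """Cramer's V between symbols s_t and s_{t+lag} of a symbolic sequence.

    Illustrates step one of the two-step scheme: computing a categorical
    serial dependence measure at a given lag. The context length k would
    then be chosen from the profile of this measure over lags.
    """
    s = np.asarray(seq)
    a, b = s[:-lag], s[lag:]
    cats = np.unique(s)
    # lagged contingency table of symbol co-occurrences
    table = np.array([[np.sum((a == u) & (b == v)) for v in cats]
                      for u in cats], dtype=float)
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    safe = np.where(expected > 0, expected, 1.0)  # zero cells contribute 0
    chi2 = np.sum((table - expected) ** 2 / safe)
    r = len(cats)
    return float(np.sqrt(chi2 / (n * (r - 1))))
```

V lies in [0, 1], equalling 1 for a deterministic lag relationship and staying near 0 for independent symbols, which is what makes it sensitive to the true context length.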


Agile Interception of a Flying Target using Competitive Reinforcement Learning

Gavin, Timothée, Lacroix, Simon, Bronz, Murat

arXiv.org Machine Learning

The interception of agile aerial targets using autonomous drones is a challenging and increasingly relevant problem in robotics and security. The increasing presence of unmanned aerial vehicles (UAVs) in unauthorized, restricted airspaces poses significant safety and security risks and has spurred interest in developing effective interception strategies [1]. In particular, scenarios such as airspace protection, infrastructure security, and event safety require the ability to capture or neutralize unauthorized drones with high precision and minimal collateral risk. Deploying interceptor drones equipped with nets is a promising approach, but it demands advanced control capabilities to match or exceed the agility of evasive targets. Traditional interception methods often rely on accurate models, preplanned strategies, or predictable target behaviour [2]. However, modern quadrotor drones can perform highly dynamic manoeuvres, and will actively evade capture, rendering their trajectories unpredictable and challenging the effectiveness of classical methods [3].



Addressing Hidden Confounding with Heterogeneous Observational Datasets for Recommendation

Neural Information Processing Systems

The collected data in recommender systems generally suffers from selection bias. Considerable work has been proposed to address selection bias induced by observed user and item features, but these methods fail when hidden features (e.g., user age or salary) that affect